
98 research outputs found

Toward More Predictive Models by Leveraging Multimodal Data

Author: Srinivasan Sudarshan
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 15/05/2020

Data often consists of both structured and unstructured components. Both forms carry information that machine learning models can exploit to improve their prediction performance on a task. However, integrating the features from the two forms is a hard, complicated task. This is all the more true for models that operate under time constraints. Time-constrained models are machine learning models that work on input where time causality has to be maintained, such as predicting something in the future based on past data. Most previous work does not have a dedicated pipeline that is generalizable to different tasks and domains, especially under time constraints. In this work, we present a systematic, domain-agnostic pipeline for integrating features from structured and unstructured data while maintaining time causality for building models. We focus on the healthcare and consumer market domains and perform experiments, preprocess data, and build models to demonstrate the generalizability of the pipeline. More specifically, we focus on the task of identifying patients who are at risk of an imminent ICU admission. We use our pipeline to solve this task and show how augmenting unstructured data with structured data improves model performance. We found that by combining structured and unstructured data we can get a performance improvement of up to 8.5
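The time-causality constraint described in this abstract can be illustrated with a minimal sketch. It is not the author's pipeline: the function name `causal_features`, the tuple layout `(timestamp, value)`, and the choice of a strict "earlier than" cutoff are all assumptions made for illustration.

```python
from datetime import datetime


def causal_features(structured, notes, prediction_time):
    """Merge structured readings and free-text notes recorded before prediction_time.

    structured: list of (timestamp, numeric_value) pairs (hypothetical layout)
    notes:      list of (timestamp, text) pairs (hypothetical layout)
    """
    # Sort by timestamp so "latest" is well defined even for unordered input.
    readings = sorted(structured, key=lambda item: item[0])
    texts = sorted(notes, key=lambda item: item[0])

    # Time causality: drop anything recorded at or after the prediction time.
    past_values = [value for ts, value in readings if ts < prediction_time]
    past_texts = [text for ts, text in texts if ts < prediction_time]

    return {
        "latest_reading": past_values[-1] if past_values else None,
        "note_text": " ".join(past_texts),
    }


if __name__ == "__main__":
    def hour(h):
        return datetime(2020, 1, 1, h)

    features = causal_features(
        structured=[(hour(1), 80), (hour(3), 120)],
        notes=[(hour(2), "stable"), (hour(4), "short of breath")],
        prediction_time=hour(3),
    )
    print(features)
```

Filtering both sources against the same cutoff is what keeps a model from seeing information that would not have been available when the prediction was made.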

University of Tennessee, Knoxville: Trace


Dynamic Processor Reconfiguration for Power, Performance and Reliability Management

Author: Srinivasan Sudarshan
Publication venue: ScholarWorks@UMass Amherst
Publication date: 10/11/2016

Technology advancements allowed more transistors to be packed in a smaller area, while the improved performance helped in achieving higher clock frequencies. This unfortunately led to a power density problem, forcing the processor industry to lower the clock frequency and integrate multiple cores on the same die. Depending on core characteristics, the multiple cores in the die could be symmetric or asymmetric. Asymmetric multi-core processors (AMPs) have been proposed as an alternative to symmetric multi-cores to improve power efficiency. AMPs comprise cores that implement the same ISA but differ in performance and power characteristics due to varying sizes of micro-architectural resources. As the computational bottleneck of a workload shifts from one resource to another during its course of execution, reassigning it to another core (where it runs more efficiently) can improve the overall power efficiency. Thus, achieving high power efficiency in AMPs requires (i) a diverse set of cores that are optimized for various program phases, (ii) runtime analysis to determine the best core to run on, and (iii) low overhead of reassigning a thread to a different core type. Decisions to swap threads between cores in AMPs are made at a coarse granularity of millions of instructions, to mitigate the impact of thread migration overhead. But the computational needs of the program change rapidly during the course of its execution. The best core configuration for an application, such that both power consumption and performance are optimized, changes rapidly over time at a fine granularity of thousands of instructions. This dissertation explores ways to design core micro-architecture such that high power efficiency could be achieved if switching overhead could be lowered, enabling fine-grain switching.
To take advantage of power-saving opportunities at fine granularity, this thesis explores reconfigurable/morphable architectures where core resources are reconfigured on demand to suit the needs of the executing application. At first, we explore reconfigurable architectures consisting of two kinds of cores: out-of-order (OOO) big cores and in-order (InO) small cores. The big cores provide higher performance while the small cores are more power efficient. In this proposed architecture, the OOO core reconfigures into an InO core at run time. Our proposed online management scheme decides when to switch between these core types such that we obtain significant power benefits without impacting performance. We also observe that the resource requirements of applications can be quite diverse and, consequently, resource bottlenecks or excesses can vary considerably. Thus, reconfiguration between just two core modes may not fully exploit power and performance improvement opportunities. We therefore explore reconfigurable architectures consisting of diverse core types that are not limited to big and little cores. A single core can reconfigure into multiple core modes where each mode has unique power and performance characteristics. Workload performance on a particular core mode depends on a large set of processor resources. Some workloads are highly memory intensive, some exhibit large instruction dependency, some experience high rates of branch misprediction, while other workloads exhibit large exploitable instruction-level parallelism. A diverse set of core modes is needed that can address shifting resource needs during various program phases of an application. Different trade-offs in power and performance could be achieved by reducing or expanding the size of various resources. Trade-offs for each core mode are also affected by operating voltage and frequency.
We therefore propose joint core resource resizing with dynamic voltage and frequency scaling (DVFS), which is important for applications whose performance is sensitive to changes in frequency. Thus, at fine granularity, the core should adapt to varying instruction window sizes, execution bandwidth and frequency to meet the demands of the workload at run time to improve power efficiency. Many current processors employ DVFS aggressively to improve power efficiency and maximize performance. This dissertation studies the trade-off in power efficiency between fine-grain DVFS and the reconfigurable architectures mentioned above. We also explore another important problem: continued scaling of devices results in higher vulnerability to soft errors. We consider dynamic core reconfiguration from the perspectives of both power efficiency and vulnerability to soft errors. An online management scheme is proposed such that core reconfiguration upon a thread switch not only improves power efficiency but also does not increase the vulnerability to soft errors. In summary, we propose in this thesis several solutions for improving power efficiency by integrating heterogeneity within the core. We also address how popular power reduction techniques like DVFS compare to our approach. Finally, we address reliability challenges along with improving power efficiency

ScholarWorks@UMass Amherst

Climate Resilient Concrete Structures in Marine Environment of Bangladesh

Author: Gibb Ian
Srinivasan Sudarshan
Publication venue: Purdue University (bepress)
Publication date: 06/11/2019

Bangladesh has a vast coastal infrastructure seriously affected by climate change and associated extreme environmental conditions. The rural construction sector in Bangladesh will be undergoing rapid growth in the next 10 years through rural infrastructure development programmes funded by the Asian Development Bank and the World Bank. The Local Government Engineering Department (LGED) in Bangladesh owns the rural concrete infrastructure, maintains around 380,000 linear metres of concrete bridges or culverts in the rural coastal areas, and is planning to build more than 200,000 linear metres during the next ten years. In order to design and construct durable concrete structures to withstand the aggressive coastal environment for the intended design life, there is a need to study the local factors that influence the durability of reinforced concrete structures. This paper reports on the findings of a research programme, funded by DfID, to identify the major factors that contribute to premature deterioration of concrete structures, consider future climate change and identify solutions to improve the durability of coastal concrete structures in Bangladesh. A condition survey of bridges in the coastal districts, undertaken for the project, indicated that the concrete structures were deteriorating rapidly (within 5-10 years of construction) due to exposure to an aggressive marine environment, issues related to poor workmanship, limited availability of good quality materials and lack of awareness of good construction practices. The paper also reports on the outcome of an experimental investigation on the performance of local materials aimed at developing concrete mixes which will provide enhanced durability in future concrete structures

Purdue E-Pubs

Rotational Abstractions for Verification of Quantum Fourier Transform Circuits

Author: Govindankutty Arun
Mathure Nimish
Srinivasan Sudarshan K.
Publication venue: Institution of Engineering and Technology (IET)
Publication date: 02/01/2023

With the race to build large-scale quantum computers and efforts to exploit quantum algorithms for efficient problem solving in science and engineering disciplines, the need for efficient and scalable verification methods is of vital importance. We propose a novel formal verification method that is targeted at Quantum Fourier Transform (QFT) circuits. QFT is a fundamental quantum algorithm that forms the basis of many quantum computing applications. The verification method employs abstractions of quantum gates used in QFT that lead to a reduction of the verification problem from Hilbert space to the quantifier-free logic of bit-vectors. Very efficient decision procedures are available to reason about bit-vectors. Therefore, our method is able to scale up to the verification of QFT circuits with 10,000 qubits and 50 million quantum gates, providing a meteoric advance in the size of QFT circuits thus far verified using formal verification methods
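For orientation, the transform that a QFT circuit is expected to implement can be written down directly as a matrix. The sketch below builds that reference matrix for a small number of qubits using the standard definition; it is not the bit-vector verification method from the paper, and the helper names `qft_matrix` and `apply_matrix` are hypothetical.

```python
import cmath


def qft_matrix(num_qubits):
    """Return the 2^n x 2^n QFT matrix with entries omega^(row*col) / sqrt(2^n)."""
    dim = 2 ** num_qubits
    omega = cmath.exp(2j * cmath.pi / dim)  # primitive dim-th root of unity
    scale = dim ** -0.5
    return [[scale * omega ** (row * col) for col in range(dim)] for row in range(dim)]


def apply_matrix(matrix, state):
    """Multiply a square matrix by a state vector given as a list of amplitudes."""
    return [sum(entry * amp for entry, amp in zip(row, state)) for row in matrix]


if __name__ == "__main__":
    # The basis state |00> maps to a uniform superposition over all four states.
    print(apply_matrix(qft_matrix(2), [1, 0, 0, 0]))
```

The matrix doubles in each dimension with every added qubit, so this brute-force view is only practical for a handful of qubits, which is the scaling problem the abstract's abstraction-based method is aimed at.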

arXiv.org e-Print Archive

Directory of Open Access Journals

Efficient Communication Acceleration for Next-Gen Scale-up Deep Learning Training Platforms

Author: Denton Matthew
Krishna Tushar
Rashidi Saeed
Sridharan Srinivas
Srinivasan Sudarshan
Publication date: 08/07/2020

Deep Learning (DL) training platforms are built by interconnecting multiple DL accelerators (e.g., GPU/TPU) via fast, customized interconnects. As the size of DL models and the compute efficiency of the accelerators have continued to increase, there has also been a corresponding steady increase in the bandwidth of these interconnects. Systems today provide 100s of gigabytes (GBs) of interconnect bandwidth via a mix of solutions such as Multi-Chip packaging modules (MCM) and proprietary interconnects (e.g., NVLink) that together form the scale-up network of accelerators. However, as we identify in this work, a significant portion of this bandwidth goes under-utilized. This is because (i) using compute cores for executing collective operations such as all-reduce decreases overall compute efficiency, (ii) there is memory bandwidth contention between the accesses for arithmetic operations vs those for collectives, and (iii) there is significant internal bus congestion that increases the latency of communication operations. To address this challenge, we propose a novel microarchitecture, called Accelerator Collectives Engine (ACE), for DL collective communication offload. ACE is a smart network interface (NIC) tuned to cope with the high-bandwidth and low-latency requirements of scale-up networks and is able to efficiently drive the various scale-up network systems (e.g. switch-based or point-to-point topologies). We evaluate the benefits of ACE with micro-benchmarks (e.g. single collective performance) and popular DL models using an end-to-end DL training simulator. For modern DL workloads, ACE on average increases the network bandwidth utilization by 1.97X, resulting in 2.71X and 1.44X speedup in iteration time for ResNet-50 and GNMT, respectively
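As background for the all-reduce collective this abstract refers to, here is a minimal single-process simulation of the well-known ring all-reduce (reduce-scatter followed by all-gather). It is not ACE or anything from the paper; the function name `ring_all_reduce` is hypothetical, and the sketch assumes each simulated rank holds exactly as many elements as there are ranks, so one chunk is one element.

```python
def ring_all_reduce(buffers):
    """Simulate a sum all-reduce over a ring of len(buffers) ranks.

    Assumption: every buffer has exactly len(buffers) elements (one per chunk).
    """
    n = len(buffers)
    data = [list(buf) for buf in buffers]  # work on copies, one list per rank

    # Reduce-scatter: after n-1 steps, rank r holds the full sum of chunk (r+1) % n.
    for step in range(n - 1):
        # Collect all sends first so every rank sends its pre-step value.
        sends = [((r + 1) % n, (r - step) % n, data[r][(r - step) % n]) for r in range(n)]
        for dest, chunk, value in sends:
            data[dest][chunk] += value

    # All-gather: circulate the fully reduced chunks around the ring.
    for step in range(n - 1):
        sends = [((r + 1) % n, (r + 1 - step) % n, data[r][(r + 1 - step) % n]) for r in range(n)]
        for dest, chunk, value in sends:
            data[dest][chunk] = value

    return data


if __name__ == "__main__":
    result = ring_all_reduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
    print(result)  # every rank ends with [111, 222, 333]
```

Each rank sends and receives one chunk per step, so the per-rank traffic is 2(n-1) chunks in total, which is why the bandwidth of the links between accelerators matters so much for this operation.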

arXiv.org e-Print Archive

Adaptive Global Carbon Monoxide Kinetic Mechanism over Platinum/Alumina Catalysts

Author: Depcik Christopher
Loya Sudarshan K.
Srinivasan Anand
Wentworth Travis
Williams Susan Michelle
Publication venue: MDPI AG
Publication date: 27/10/2015

Carbon monoxide (CO) oxidation is one of the more widely researched mechanisms given its pertinence across many industrial platforms. Because of this, ample information exists as to the detailed reaction steps in its mechanism. While detailed kinetic mechanisms are more accurate and can be written as a function of catalytic material on the surface, global mechanisms are more widely used because of their computational efficiency advantage. This paper merges the theory behind detailed kinetics into a global kinetic model for the singular CO oxidation reaction while formulating expressions that adapt to catalyst properties on the surface such as dispersion and precious metal loading. Results illustrate that the model is able to predict the light-off and extinction temperatures during a hysteresis experiment as a function of different inlet CO concentrations and precious metal dispersion

KU ScholarWorks

TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Training

Author: Durg Ajaya
Elavazhagan Midhilesh
Gupta Swati
Krishna Tushar
Srinivasan Sudarshan
Won William
Publication date: 11/04/2023

Collective communications are an indispensable part of distributed training. Running a topology-aware collective algorithm is crucial for optimizing communication performance by minimizing congestion. Today such algorithms only exist for a small set of simple topologies, limiting the topologies employed in training clusters and the handling of irregular topologies due to network failures. In this paper, we propose TACOS, an automated topology-aware collective synthesizer for arbitrary input network topologies. TACOS synthesized an All-Reduce algorithm that is 3.73x faster than baselines, and synthesized collective algorithms for a 512-NPU system in just 6.1 minutes

arXiv.org e-Print Archive